[feature](be) Add adaptive batch size for scan path (#62835) by mrhhsg · Pull Request #63005 · apache/doris

mrhhsg · 2026-05-06T06:43:02Z

Pick PR: #62835

Problem Summary: Add adaptive block row prediction for SegmentIterator, OLAP scan, file scan, and format readers. The scan path now uses a row ceiling plus preferred output byte budget to reduce oversized blocks for wide rows while preserving row-limited behavior for narrow rows. This commit also introduces the shared session/config/thrift/runtime budget plumbing used by later operators.

Adds adaptive batch size controls for scan output blocks: preferred_block_size_bytes and preferred_max_column_in_block_size_bytes.
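
To make the "row ceiling plus preferred output byte budget" idea concrete, here is a minimal sketch of how a reader might pick the row count for its next block. It is not the code from this PR: `BlockBudget`, `predict_next_rows`, and the rule "average bytes per row from the previous block" are assumptions made for illustration, and only `preferred_block_bytes` is meant to mirror the `preferred_block_size_bytes` control named above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical budget holder (illustrative, not the Doris type).
struct BlockBudget {
    std::size_t max_rows;               // row ceiling, e.g. the configured batch size
    std::size_t preferred_block_bytes;  // byte budget; 0 means "no byte limit" in this sketch
};

// Hypothetical predictor: estimate how many rows fit in the byte budget,
// using the average row width observed in the previously produced block.
inline std::size_t predict_next_rows(const BlockBudget& budget,
                                     std::size_t observed_rows,
                                     std::size_t observed_bytes) {
    assert(budget.max_rows > 0);
    // Without a byte budget or any history, keep the row-limited behavior.
    if (budget.preferred_block_bytes == 0 || observed_rows == 0 || observed_bytes == 0) {
        return budget.max_rows;
    }
    // Average bytes per row, rounded up so the estimate stays conservative.
    const std::size_t bytes_per_row = (observed_bytes + observed_rows - 1) / observed_rows;
    const std::size_t rows_by_bytes = budget.preferred_block_bytes / bytes_per_row;
    // Always make progress (at least one row) and never exceed the row ceiling.
    return std::min(budget.max_rows, std::max<std::size_t>(rows_by_bytes, 1));
}
```

With a ceiling of 4096 rows and an 8 MiB budget, rows averaging 100 bytes stay row-limited at 4096, while rows averaging 1 MiB are cut to 8 per block. This matches the stated goal of shrinking blocks for wide rows while leaving narrow rows unchanged.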

Test: Unit Test
Unit Test: ./run-be-ut.sh --run --filter=BlockBudgetTest.*:RuntimeStateBatchSizeTest.*:RuntimeStateBlockSizeBytesTest.*:RuntimeStateMaxColBytesTest.*:MockRuntimeStateBlockBudgetTest.*:AdaptiveBlockSizePredictorTest.*:BlockReaderBatchMaxRowsTest.*:EstimateCollectedEnoughTest.*:CollectedEnoughWithColumnsTest.*:BlockReaderByteBudgetTest.*:SegmentColumnRawDataBytesTest.*:CsvReaderSetBatchSizeTest.*:NewJsonReaderSetBatchSizeTest.*:OrcReaderTest.*:TableFormatReaderTest.*:ProfileSpecTest.*:LocalExchangerTest.*
Behavior changed: Yes (scan output block sizing can now be byte-budget limited when adaptive batch size is enabled)
Does this need documentation: Yes


mrhhsg · 2026-05-06T06:43:15Z

run buildall

mrhhsg · 2026-05-09T08:14:28Z

run buildall

hello-stephen · 2026-05-09T09:45:29Z

FE UT Coverage Report

Increment line coverage 83.33% (5/6) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-05-09T11:21:49Z

BE Regression && UT Coverage Report

Increment line coverage 86.06% (358/416) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.33% (27069/36913)
Line Coverage	56.86% (291632/512884)
Region Coverage	54.16% (242731/448210)
Branch Coverage	55.89% (105564/188893)

hello-stephen · 2026-05-09T11:35:09Z

FE Regression Coverage Report

Increment line coverage 66.67% (4/6) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-05-09T11:45:09Z

BE Regression && UT Coverage Report

Increment line coverage 85.58% (356/416) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.51% (26402/36922)
Line Coverage	54.44% (279470/513332)
Region Coverage	51.53% (231282/448864)
Branch Coverage	53.03% (100273/189092)

hello-stephen · 2026-05-09T11:54:12Z

BE Regression && UT Coverage Report

Increment line coverage 85.58% (356/416) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.50% (26398/36922)
Line Coverage	54.44% (279462/513332)
Region Coverage	51.54% (231338/448864)
Branch Coverage	53.03% (100267/189092)

hello-stephen · 2026-05-09T11:55:26Z

FE Regression Coverage Report

Increment line coverage 66.67% (4/6) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-05-09T11:59:10Z

BE Regression && UT Coverage Report

Increment line coverage 85.58% (356/416) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.50% (26398/36922)
Line Coverage	54.44% (279462/513332)
Region Coverage	51.54% (231338/448864)
Branch Coverage	53.03% (100267/189092)

hello-stephen · 2026-05-09T12:06:05Z

FE Regression Coverage Report

Increment line coverage 66.67% (4/6) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-05-09T12:09:00Z

FE Regression Coverage Report

Increment line coverage 66.67% (4/6) 🎉
Increment coverage report
Complete coverage report

Issue Number: None

Related PR: None

Problem Summary: Add adaptive block row prediction for SegmentIterator, OLAP scan, file scan, and format readers. The scan path now uses a row ceiling plus preferred output byte budget to reduce oversized blocks for wide rows while preserving row-limited behavior for narrow rows. This commit also introduces the shared session/config/thrift/runtime budget plumbing used by later operators.

Adds adaptive batch size controls for scan output blocks: preferred_block_size_bytes and preferred_max_column_in_block_size_bytes.

- Test: Unit Test
  - Unit Test: ./run-be-ut.sh --run --filter=BlockBudgetTest.*:RuntimeStateBatchSizeTest.*:RuntimeStateBlockSizeBytesTest.*:RuntimeStateMaxColBytesTest.*:MockRuntimeStateBlockBudgetTest.*:AdaptiveBlockSizePredictorTest.*:BlockReaderBatchMaxRowsTest.*:EstimateCollectedEnoughTest.*:CollectedEnoughWithColumnsTest.*:BlockReaderByteBudgetTest.*:SegmentColumnRawDataBytesTest.*:CsvReaderSetBatchSizeTest.*:NewJsonReaderSetBatchSizeTest.*:OrcReaderTest.*:TableFormatReaderTest.*:ProfileSpecTest.*:LocalExchangerTest.*
- Behavior changed: Yes (scan output block sizing can now be byte-budget limited when adaptive batch size is enabled)
- Does this need documentation: Yes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Issue Number: None

Related PR: None

Problem Summary: Cluster-key MOW compaction sorts rows by cluster key, so duplicate unique keys may be non-adjacent and can remain visible in the output rowset. Scan the output rowset primary key index after compaction and add output-rowset internal delete bitmap entries for older duplicate unique-key rows.

None

- Test: Unit Test
  - Ran ./run-be-ut.sh --run --filter=VerticalCompactionTest.ClusterKeyMowCompactionNeedsOutputRowsetInternalDedup -j 8
- Behavior changed: No
- Does this need documentation: No
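
The deduplication pass in the commit message above can be sketched roughly as follows. This is an illustration only, not the Doris implementation: `IndexEntry`, `find_older_duplicates`, and the use of a numeric `version` to decide which row is "older" are all assumptions, and the returned set of row ids merely stands in for the output-rowset internal delete bitmap.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical view of one primary key index entry in the output rowset.
struct IndexEntry {
    std::string unique_key;  // encoded unique key
    std::uint64_t version;   // assumed ordering field: larger means newer
    std::uint32_t row_id;    // row position inside the output rowset
};

// Returns the row ids that should be marked deleted: for every unique key
// that appears more than once, every row except the newest one.
inline std::set<std::uint32_t> find_older_duplicates(const std::vector<IndexEntry>& entries) {
    std::map<std::string, const IndexEntry*> newest;  // newest row seen so far per key
    std::set<std::uint32_t> deleted;
    for (const IndexEntry& e : entries) {
        auto res = newest.emplace(e.unique_key, &e);
        if (res.second) {
            continue;  // first time this key is seen
        }
        const IndexEntry*& current = res.first->second;
        if (e.version > current->version) {
            deleted.insert(current->row_id);  // the previously kept row is older
            current = &e;
        } else {
            deleted.insert(e.row_id);  // this row is older (or a tie); keep the earlier one
        }
    }
    return deleted;
}
```

Because the lookup is keyed on the unique key rather than on adjacency, duplicates are found even when sorting by cluster key has placed them far apart, which is the situation the commit message describes.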

hello-stephen · 2026-05-10T18:21:34Z

FE Regression Coverage Report

Increment line coverage 66.67% (4/6) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-05-10T18:39:45Z

FE Regression Coverage Report

Increment line coverage 66.67% (4/6) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-05-11T08:03:46Z

BE Regression && UT Coverage Report

Increment line coverage 83.74% (546/652) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.46% (26384/36919)
Line Coverage	54.54% (280001/513416)
Region Coverage	51.80% (232630/449067)
Branch Coverage	53.15% (100563/189200)

github-actions · 2026-05-11T08:39:37Z

PR approved by at least one committer and no changes requested.

github-actions · 2026-05-11T08:39:40Z

PR approved by anyone and no changes requested.

hello-stephen · 2026-05-11T08:43:23Z

skip buildall

mrhhsg requested a review from yiguolei as a code owner May 6, 2026 06:43

mrhhsg force-pushed the pick_abs branch from 5c319d0 to a2b29d9 Compare May 6, 2026 07:03

mrhhsg force-pushed the pick_abs branch 2 times, most recently from 7b81191 to af17679 Compare May 10, 2026 14:02

mrhhsg force-pushed the pick_abs branch from af17679 to 16fd978 Compare May 10, 2026 15:50

yiguolei closed this May 11, 2026

yiguolei reopened this May 11, 2026

yiguolei approved these changes May 11, 2026

View reviewed changes

github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 11, 2026

github-actions Bot added the reviewed label May 11, 2026

yiguolei merged commit 83d9e70 into apache:branch-4.1 May 11, 2026
46 of 50 checks passed
